This exercise aims to to explain factors affecting the resale prices of public housing in Singapore by building hedonic pricing model using appropriate Geographical Weighted Regression (GWR) methods.
This exercise aims to investigate factors affecting the resale prices of four-room public housing in Singapore by building hedonic pricing models. The hedonic price models will be built using appropriate Geographical Weighted Regression (GWR) methods and by the end of this exercise, the following will be covered:
This section covers installing the applicable R packages as well as importing the necessary data for analysis
The following R packages will be used in this analysis:
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'onemapsgapi', 'dplyr', 'httr', 'rjson')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
The table below shows all the data that will be imported for this analysis
| Data Type | Name | Source |
|---|---|---|
| Geospatial | Master Plan 2014 Subzone Boundary (Web) | link |
| Aspatial | HDB Resale Prices | link |
| Geospatial | Childcare | OnemapAPI |
| Geospatial | Hawker | OnemapAPI |
| Geospatial | Supermarket | OnemapAPI |
| Geospatial | MRT & LRT Locations | link |
DATA TALKING POINTS:
Master Plan 2014 Subzone Boundary
mpsz <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\Niharika-avula\IS415_blog\_posts\Take Home Exercise- 3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
SUPERMARKET
Importing this data from OnemapAPI using the code chunk below
supermarket <- st_read("data/geospatial/supermarkets-geojson.geojson")%>%
mutate(lat = st_coordinates(.)[,2], lng = st_coordinates(.)[,1])%>%
st_transform(crs = 3414)
Reading layer `supermarkets-geojson' from data source
`C:\Niharika-avula\IS415_blog\_posts\Take Home Exercise- 3\data\geospatial\supermarkets-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 526 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6258 ymin: 1.24715 xmax: 104.0036 ymax: 1.461526
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
glimpse(supermarket)
Rows: 526
Columns: 5
$ Name <chr> "kml_1", "kml_2", "kml_3", "kml_4", "kml_5", "km~
$ Description <chr> "<center><table><tr><th colspan='2' align='cente~
$ geometry <POINT [m]> POINT Z (35561.22 42685.17 0), POINT Z (32~
$ lat <dbl> 1.402303, 1.314239, 1.373321, 1.332959, 1.353453~
$ lng <dbl> 103.9013, 103.8709, 103.8864, 103.9149, 103.9530~
HAWKER CENTRES
hawker <- st_read("data/geospatial/hawker-centres-geojson.geojson")%>%
mutate(lat = st_coordinates(.)[,2], lng = st_coordinates(.)[,1])%>%
st_transform(crs = 3414)
Reading layer `hawker-centres-geojson' from data source
`C:\Niharika-avula\IS415_blog\_posts\Take Home Exercise- 3\data\geospatial\hawker-centres-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449217
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
glimpse(hawker)
Rows: 125
Columns: 5
$ Name <chr> "kml_1", "kml_2", "kml_3", "kml_4", "kml_5", "km~
$ Description <chr> "<center><table><tr><th colspan='2' align='cente~
$ geometry <POINT [m]> POINT Z (39731.49 34910.13 0), POINT Z (26~
$ lat <dbl> 1.331987, 1.287331, 1.372385, 1.363157, 1.352007~
$ lng <dbl> 103.9387, 103.8183, 103.8290, 103.8667, 103.8370~
CHILDCARE
childcare <- st_read("data/geospatial/child-care-services-geojson.geojson")%>%
mutate(lat = st_coordinates(.)[,2], lng = st_coordinates(.)[,1])%>%
st_transform(crs = 3414)
Reading layer `child-care-services-geojson' from data source
`C:\Niharika-avula\IS415_blog\_posts\Take Home Exercise- 3\data\geospatial\child-care-services-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 1545 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.248403 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
glimpse(childcare)
Rows: 1,545
Columns: 5
$ Name <chr> "kml_1", "kml_2", "kml_3", "kml_4", "kml_5", "km~
$ Description <chr> "<center><table><tr><th colspan='2' align='cente~
$ geometry <POINT [m]> POINT Z (27976.73 45716.7 0), POINT Z (258~
$ lat <dbl> 1.429720, 1.286680, 1.354655, 1.386540, 1.319731~
$ lng <dbl> 103.8331, 103.8138, 103.8639, 103.8447, 103.9521~
MRT & LRT
MRT_LRT <- st_read(dsn = "data/geospatial",
layer = "MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source
`C:\Niharika-avula\IS415_blog\_posts\Take Home Exercise- 3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 171 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
HDB_resale <- read_csv("data/aspatial/HDB resale-flat-prices.csv", show_col_types=FALSE)
We have come to end of Section 2 with R packages installed and the Geospatial and Aspatial data imported, next section will cover data wrangling process of the imported data
This section covers all the steps taken in the pre-processing of the mpsz data, which includes the following steps:
Took reference from senior’s project given as sample for this section
Checking for missing values as they can impact future calculations and visualisations.
Master Plan 2014 Subzone Boundary
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
There are no missing values in both mpsz data
SUPERMARKET
Simple feature collection with 0 features and 4 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] Name Description geometry lat lng
<0 rows> (or 0-length row.names)
There are no missing values in both supermarket data
HAWKER CENTRES
Simple feature collection with 0 features and 4 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] Name Description geometry lat lng
<0 rows> (or 0-length row.names)
There are no missing values in both hawker data
CHILDCARE
Simple feature collection with 0 features and 4 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] Name Description geometry lat lng
<0 rows> (or 0-length row.names)
There are no missing values in both childcare data
MRT & LRT
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
There are no missing values in both MRT_LRT data
Checking for invalid geometries as they can impact future calculations and visualisations.
Master Plan 2014 Subzone Boundary
It can be seen that there are 9 invalid geometries in the mpsz data, hence the invalid geometries need to be made valid
Invalid geometries have been handled, there are no more invalid geometries in above dataset.
SUPERMARKET
It can be seen that there are no invalid geometries in the supermarket data
HAWKER CENTRES
It can be seen that there are no invalid geometries in the hawker data
CHILDCARE
It can be seen that there are no invalid geometries in the childcare data
MRT & LRTIt can be seen that there are no invalid geometries in the MRT_LRT data
Master Plan 2014 Subzone Boundary
st_crs(mpsz)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
It can be seen that while the projected CRS is SVY21, the current EPSG Code is 9001, hence the next step is to assign the correct 3414 EPSG code
mpsz <- st_set_crs(mpsz, 3414)
st_crs(mpsz)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
SUPERMARKET
st_crs(supermarket)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The correct projected CRS with EPSG Code 3413 is assigned
HAWKER CENTRES
st_crs(hawker)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The correct projected CRS with EPSG Code 3413 is assigned
CHILDCARE
st_crs(childcare)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The correct projected CRS with EPSG Code 3413 is assigned
MRT & LRT
st_crs(MRT_LRT)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
It can be seen that while the projected CRS is SVY21, the current EPSG Code is 9001, hence the next step is to assign the correct 3414 EPSG code
MRT_LRT <- st_set_crs(MRT_LRT, 3414)
st_crs(MRT_LRT)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
The correct projected CRS with EPSG Code 3413 is assigned
VISUALISING GEOSPATIAL DATA
Before we jump into the analysis, it is a good practice to visualise the geospatial data
tmap_mode("view")
tm_shape(mpsz) +
tm_borders(alpha = 0.5) +
tmap_options(check.and.fix = TRUE) +
tm_shape(MRT_LRT) +
tm_dots(col = 'red', size = 0.02) +
tm_shape(supermarket) +
tm_dots(col = 'black', size = 0.02) +
tm_shape(childcare) +
tm_dots(col = 'green', size = 0.02) +
tm_shape(hawker) +
tm_dots(col = 'blue', size = 0.02)
tmap_mode("plot")
This section covers all the steps taken to extract the data based on focus area, convert the street name and block number to derive the postal code followed by the pre-processing of the HDB_resale data, which includes the following steps:
Let’s have a glimpse at the data first
glimpse(HDB_resale)
Rows: 111,706
Columns: 11
$ month <chr> "2017-01", "2017-01", "2017-01", "2017-0~
$ town <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO"~
$ flat_type <chr> "2 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", ~
$ block <chr> "406", "108", "602", "465", "601", "150"~
$ street_name <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 4",~
$ storey_range <chr> "10 TO 12", "01 TO 03", "01 TO 03", "04 ~
$ floor_area_sqm <dbl> 44, 67, 67, 68, 67, 68, 68, 67, 68, 67, ~
$ flat_model <chr> "Improved", "New Generation", "New Gener~
$ lease_commence_date <dbl> 1979, 1978, 1980, 1980, 1980, 1981, 1979~
$ remaining_lease <chr> "61 years 04 months", "60 years 07 month~
$ resale_price <dbl> 232000, 250000, 262000, 265000, 265000, ~
It can be seen that there is no geometry data and the address is given as street name and block number instead, therefore onemapAPI will be used to derive the postal code and geometry data, but before that let’s extract the data of four-room houses within the specific period.
As we are only interested in four-room houses transacted from 1st January 2019 to 30th September 2020, we have to extract this from the HDB_resale data which contains infromation since 2017 and for all flat types
HDB_resale_four_room <- filter(HDB_resale, flat_type == '4 ROOM')
glimpse(HDB_resale_four_room)
Rows: 46,467
Columns: 11
$ month <chr> "2017-01", "2017-01", "2017-01", "2017-0~
$ town <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO"~
$ flat_type <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", ~
$ block <chr> "472", "475", "629", "546", "131", "254"~
$ street_name <chr> "ANG MO KIO AVE 10", "ANG MO KIO AVE 10"~
$ storey_range <chr> "10 TO 12", "07 TO 09", "01 TO 03", "01 ~
$ floor_area_sqm <dbl> 92, 91, 94, 92, 98, 97, 92, 91, 92, 97, ~
$ flat_model <chr> "New Generation", "New Generation", "New~
$ lease_commence_date <dbl> 1979, 1979, 1981, 1981, 1979, 1977, 1979~
$ remaining_lease <chr> "61 years 06 months", "61 years 06 month~
$ resale_price <dbl> 400000, 400000, 403000, 410000, 425888, ~
It can be seen that the flat-type is filtered to contain only 4 Room flats, next step is to further filter based on the transaction period but the data type of month column is in character.
dates_to_include <- c('2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12', '2020-01', '2020-02', '2020-03', '2020-04', '2020-05', '2020-06', '2020-07', '2020-08', '2020-09')
HDB_resale_filtered <- filter(HDB_resale_four_room, month %in% dates_to_include)
glimpse(HDB_resale_filtered)
Rows: 15,901
Columns: 11
$ month <chr> "2019-01", "2019-01", "2019-01", "2019-0~
$ town <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO"~
$ flat_type <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", ~
$ block <chr> "204", "175", "543", "118", "411", "546"~
$ street_name <chr> "ANG MO KIO AVE 3", "ANG MO KIO AVE 4", ~
$ storey_range <chr> "01 TO 03", "07 TO 09", "01 TO 03", "04 ~
$ floor_area_sqm <dbl> 92, 91, 92, 99, 92, 92, 92, 92, 93, 91, ~
$ flat_model <chr> "New Generation", "New Generation", "New~
$ lease_commence_date <dbl> 1977, 1981, 1981, 1978, 1979, 1981, 1978~
$ remaining_lease <chr> "57 years", "61 years 06 months", "61 ye~
$ resale_price <dbl> 330000, 360000, 370000, 375000, 380000, ~
With this, the subset data we are interested in is extracted for further analysis and has 15901 rows
As a first step, a new column will be created with the block and street name combined and this columns will be used to search for the geometry data from onemapAPI
library(stringr)
HDB_resale_filtered$block_street <- str_c(HDB_resale_filtered$block, ' ', HDB_resale_filtered$street_name)
HDB_resale_filtered$block_street <- gsub(" ", "+", HDB_resale_filtered$block_street, fixed = TRUE)
glimpse(HDB_resale_filtered)
Rows: 15,901
Columns: 12
$ month <chr> "2019-01", "2019-01", "2019-01", "2019-0~
$ town <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO"~
$ flat_type <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", ~
$ block <chr> "204", "175", "543", "118", "411", "546"~
$ street_name <chr> "ANG MO KIO AVE 3", "ANG MO KIO AVE 4", ~
$ storey_range <chr> "01 TO 03", "07 TO 09", "01 TO 03", "04 ~
$ floor_area_sqm <dbl> 92, 91, 92, 99, 92, 92, 92, 92, 93, 91, ~
$ flat_model <chr> "New Generation", "New Generation", "New~
$ lease_commence_date <dbl> 1977, 1981, 1981, 1978, 1979, 1981, 1978~
$ remaining_lease <chr> "57 years", "61 years 06 months", "61 ye~
$ resale_price <dbl> 330000, 360000, 370000, 375000, 380000, ~
$ block_street <chr> "204+ANG+MO+KIO+AVE+3", "175+ANG+MO+KIO+~
The following function is written to extract the geometry data (Source: Link)
library(onemapsgapi)
library(httr)
getcoordinates <- function(address){
query_string <-str_c('https://developers.onemap.sg/commonapi/search?searchVal=', toString(address), '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
jsonresp <- GET(query_string)
resp <- content(jsonresp, as="parsed")
if (length(resp$results) != 0){
resp_df <- resp$results
resp_df_final <- resp_df%>%bind_rows%>%select(BLK_NO, ROAD_NAME, POSTAL, X, Y, LATITUDE, LONGTITUDE)
#print(resp_df_final)
}
else{
resp_df_final <- 0
}
return(resp_df_final)
}
Using for loop, all 15901 addresses are passed to extract the geometry data
HDB_resale_filtered$Lat <- 0
HDB_resale_filtered$Lng <- 0
for (x in HDB_resale_filtered$block_street) {
index <- match(x, HDB_resale_filtered$block_street)
F_results <- as.list(getcoordinates(x))
if(length(F_results$LATITUDE)!=0){
HDB_resale_filtered$Lat[index] <- F_results$LATITUDE
}
else
{
HDB_resale_filtered$Lat[index] <- NA
}
if(length(F_results$LONGTITUDE)!=0){
HDB_resale_filtered$Lng[index] <- F_results$LONGTITUDE
}
else
{
HDB_resale_filtered$Lng[index] <- NA
}
}
Displaying the dataset
head(HDB_resale_filtered)
# A tibble: 6 x 14
month town flat_type block street_name storey_range floor_area_sqm
<chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 2019-01 ANG MO KIO 4 ROOM 204 ANG MO KIO~ 01 TO 03 92
2 2019-01 ANG MO KIO 4 ROOM 175 ANG MO KIO~ 07 TO 09 91
3 2019-01 ANG MO KIO 4 ROOM 543 ANG MO KIO~ 01 TO 03 92
4 2019-01 ANG MO KIO 4 ROOM 118 ANG MO KIO~ 04 TO 06 99
5 2019-01 ANG MO KIO 4 ROOM 411 ANG MO KIO~ 04 TO 06 92
6 2019-01 ANG MO KIO 4 ROOM 546 ANG MO KIO~ 10 TO 12 92
# ... with 7 more variables: flat_model <chr>,
# lease_commence_date <dbl>, remaining_lease <chr>,
# resale_price <dbl>, block_street <chr>, Lat <chr>, Lng <chr>
Lat and Lng values are appended correctly and now we can move on to the next step of wrangling process to convert the CRS and handle any missing values
Therefore, there are 13 missing values which could be not obtained due to API error, so we can proceed to remove them.
HDB_resale_filtered_sf <- st_as_sf(HDB_resale_filtered,
coords = c("Lng",
"Lat"),
crs=4326) %>%
st_transform(crs = 3414)
st_crs(HDB_resale_filtered_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
Done, it is correctly assigned now.
The locational factors used in this analysis are as below which will be calculated in this section:
Deriving Proximity AND frequency count of locational factors
FOR PROXIMITY MEASURE: Instead of using the Onemapsgapi wrapper as suggested due to its limitations such as high time consumption and loss of data due to API error, another function called the st_distance() from sf will be used, the source of this code chunk is from the link shown here. (Source: LINK)
FOR FREQUENCY COUNT: For the purpose of this, I will considering the frequency count of the supermarkets etc WITHIN 1km radius which seems more realistic
The code chunk below is written to calculate the distance and frequency for each of the different locational factors
SUPERMARKET
library(units)
library(matrixStats)
radius <- 1000
HDB_resale_filtered_sf$Prox_supermarket <- 0
HDB_resale_filtered_sf$Freq_supermarket <- 0
dist1 <- st_distance(HDB_resale_filtered_sf, supermarket)
dist1 <- drop_units(dist1)
HDB_resale_filtered_sf$Prox_supermarket <-round(rowMins(dist1))
dist1_df <- as.data.frame(dist1)
HDB_resale_filtered_sf$Freq_supermarket <- rowSums(dist1_df <= radius)
HAWKER CENTRES
radius <- 1000
HDB_resale_filtered_sf$Prox_hawker <- 0
HDB_resale_filtered_sf$Freq_hawker <- 0
dist2 <- st_distance(HDB_resale_filtered_sf, hawker)
dist2 <- drop_units(dist2)
HDB_resale_filtered_sf$Prox_hawker <- round(rowMins(dist2))
dist2_df <- as.data.frame(dist2)
HDB_resale_filtered_sf$Freq_hawker <- rowSums(dist2_df <= radius)
CHILDCARE
radius <- 1000
HDB_resale_filtered_sf$Prox_childcare <- 0
HDB_resale_filtered_sf$Freq_childcare <- 0
dist3 <- st_distance(HDB_resale_filtered_sf, childcare)
dist3 <- drop_units(dist3)
HDB_resale_filtered_sf$Prox_childcare <- round(rowMins(dist3))
dist3_df <- as.data.frame(dist3)
HDB_resale_filtered_sf$Freq_childcare <- rowSums(dist3_df <= radius)
MRT & LRT
Frequency of MRT and LRT stations is not applicable
HDB_resale_filtered_sf$Prox_MRT_LRT <- 0
dist4 <- st_distance(HDB_resale_filtered_sf, MRT_LRT)
dist4 <- drop_units(dist4)
HDB_resale_filtered_sf$Prox_MRT_LRT <- round(rowMins(dist4))
glimpse(HDB_resale_filtered_sf)
Rows: 15,888
Columns: 20
$ month <chr> "2019-01", "2019-01", "2019-01", "2019-0~
$ town <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO"~
$ flat_type <chr> "4 ROOM", "4 ROOM", "4 ROOM", "4 ROOM", ~
$ block <chr> "204", "175", "543", "118", "411", "546"~
$ street_name <chr> "ANG MO KIO AVE 3", "ANG MO KIO AVE 4", ~
$ storey_range <chr> "01 TO 03", "07 TO 09", "01 TO 03", "04 ~
$ floor_area_sqm <dbl> 92, 91, 92, 99, 92, 92, 92, 92, 93, 91, ~
$ flat_model <chr> "New Generation", "New Generation", "New~
$ lease_commence_date <dbl> 1977, 1981, 1981, 1978, 1979, 1981, 1978~
$ remaining_lease <chr> "57 years", "61 years 06 months", "61 ye~
$ resale_price <dbl> 330000, 360000, 370000, 375000, 380000, ~
$ block_street <chr> "204+ANG+MO+KIO+AVE+3", "175+ANG+MO+KIO+~
$ geometry <POINT [m]> POINT (29179.92 38822.08), POINT (~
$ Prox_supermarket <dbl> 271, 310, 319, 459, 338, 313, 309, 380, ~
$ Freq_supermarket <dbl> 7, 4, 3, 4, 4, 3, 6, 4, 1, 1, 7, 8, 6, 1~
$ Prox_hawker <dbl> 442, 270, 258, 436, 70, 299, 229, 343, 2~
$ Freq_hawker <dbl> 5, 4, 2, 4, 3, 2, 5, 5, 2, 2, 3, 5, 5, 2~
$ Prox_childcare <dbl> 191, 135, 80, 190, 175, 159, 134, 73, 12~
$ Freq_childcare <dbl> 25, 19, 14, 18, 17, 13, 25, 19, 20, 15, ~
$ Prox_MRT_LRT <dbl> 704, 403, 890, 201, 887, 921, 501, 363, ~
With this, we have all the proximity leasures of the locational factors
Structural factors being considered are as follows: * Area of Unit * Remaining Lease
str(HDB_resale_filtered_sf)
sf [15,888 x 20] (S3: sf/tbl_df/tbl/data.frame)
$ month : chr [1:15888] "2019-01" "2019-01" "2019-01" "2019-01" ...
$ town : chr [1:15888] "ANG MO KIO" "ANG MO KIO" "ANG MO KIO" "ANG MO KIO" ...
$ flat_type : chr [1:15888] "4 ROOM" "4 ROOM" "4 ROOM" "4 ROOM" ...
$ block : chr [1:15888] "204" "175" "543" "118" ...
$ street_name : chr [1:15888] "ANG MO KIO AVE 3" "ANG MO KIO AVE 4" "ANG MO KIO AVE 10" "ANG MO KIO AVE 4" ...
$ storey_range : chr [1:15888] "01 TO 03" "07 TO 09" "01 TO 03" "04 TO 06" ...
$ floor_area_sqm : num [1:15888] 92 91 92 99 92 92 92 92 93 91 ...
$ flat_model : chr [1:15888] "New Generation" "New Generation" "New Generation" "New Generation" ...
$ lease_commence_date: num [1:15888] 1977 1981 1981 1978 1979 ...
$ remaining_lease : chr [1:15888] "57 years" "61 years 06 months" "61 years 01 month" "58 years 04 months" ...
$ resale_price : num [1:15888] 330000 360000 370000 375000 380000 380000 385000 395000 395000 395000 ...
$ block_street : chr [1:15888] "204+ANG+MO+KIO+AVE+3" "175+ANG+MO+KIO+AVE+4" "543+ANG+MO+KIO+AVE+10" "118+ANG+MO+KIO+AVE+4" ...
$ geometry :sfc_POINT of length 15888; first list element: 'XY' num [1:2] 29180 38822
$ Prox_supermarket : num [1:15888] 271 310 319 459 338 313 309 380 693 779 ...
$ Freq_supermarket : num [1:15888] 7 4 3 4 4 3 6 4 1 1 ...
$ Prox_hawker : num [1:15888] 442 270 258 436 70 299 229 343 217 556 ...
$ Freq_hawker : num [1:15888] 5 4 2 4 3 2 5 5 2 2 ...
$ Prox_childcare : num [1:15888] 191 135 80 190 175 159 134 73 125 116 ...
$ Freq_childcare : num [1:15888] 25 19 14 18 17 13 25 19 20 15 ...
$ Prox_MRT_LRT : num [1:15888] 704 403 890 201 887 921 501 363 571 543 ...
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:19] "month" "town" "flat_type" "block" ...
It can be seen that remaining_lease is in character type and needs to be converted
HDB_resale_filtered_sf$remaining_lease <- as.numeric(gsub(".*?([0-9]+).*", "\\1", HDB_resale_filtered_sf$remaining_lease))
str(HDB_resale_filtered_sf)
sf [15,888 x 20] (S3: sf/tbl_df/tbl/data.frame)
$ month : chr [1:15888] "2019-01" "2019-01" "2019-01" "2019-01" ...
$ town : chr [1:15888] "ANG MO KIO" "ANG MO KIO" "ANG MO KIO" "ANG MO KIO" ...
$ flat_type : chr [1:15888] "4 ROOM" "4 ROOM" "4 ROOM" "4 ROOM" ...
$ block : chr [1:15888] "204" "175" "543" "118" ...
$ street_name : chr [1:15888] "ANG MO KIO AVE 3" "ANG MO KIO AVE 4" "ANG MO KIO AVE 10" "ANG MO KIO AVE 4" ...
$ storey_range : chr [1:15888] "01 TO 03" "07 TO 09" "01 TO 03" "04 TO 06" ...
$ floor_area_sqm : num [1:15888] 92 91 92 99 92 92 92 92 93 91 ...
$ flat_model : chr [1:15888] "New Generation" "New Generation" "New Generation" "New Generation" ...
$ lease_commence_date: num [1:15888] 1977 1981 1981 1978 1979 ...
$ remaining_lease : num [1:15888] 57 61 61 58 59 61 58 62 60 60 ...
$ resale_price : num [1:15888] 330000 360000 370000 375000 380000 380000 385000 395000 395000 395000 ...
$ block_street : chr [1:15888] "204+ANG+MO+KIO+AVE+3" "175+ANG+MO+KIO+AVE+4" "543+ANG+MO+KIO+AVE+10" "118+ANG+MO+KIO+AVE+4" ...
$ geometry :sfc_POINT of length 15888; first list element: 'XY' num [1:2] 29180 38822
$ Prox_supermarket : num [1:15888] 271 310 319 459 338 313 309 380 693 779 ...
$ Freq_supermarket : num [1:15888] 7 4 3 4 4 3 6 4 1 1 ...
$ Prox_hawker : num [1:15888] 442 270 258 436 70 299 229 343 217 556 ...
$ Freq_hawker : num [1:15888] 5 4 2 4 3 2 5 5 2 2 ...
$ Prox_childcare : num [1:15888] 191 135 80 190 175 159 134 73 125 116 ...
$ Freq_childcare : num [1:15888] 25 19 14 18 17 13 25 19 20 15 ...
$ Prox_MRT_LRT : num [1:15888] 704 403 890 201 887 921 501 363 571 543 ...
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:19] "month" "town" "flat_type" "block" ...
With that, we can conclude the data wrangling process and proceed to the next section
The distribution of the independent variables is shown using histogram
AREA_SQM <- ggplot(data=HDB_resale_filtered_sf, aes(x= `floor_area_sqm`)) +
geom_histogram(bins=20, color="black", fill="light blue")
Remaining_Lease <- ggplot(data=HDB_resale_filtered_sf, aes(x= `remaining_lease`)) +
geom_histogram(bins=20, color="black", fill="light blue")
Prox_supermarket1 <- ggplot(data=HDB_resale_filtered_sf, aes(x= `Prox_supermarket`)) +
geom_histogram(bins=20, color="black", fill="light blue") +
xlim(0, 3000)
Prox_hawker1 <- ggplot(data=HDB_resale_filtered_sf, aes(x= `Prox_hawker`)) +
geom_histogram(bins=20, color="black", fill="light blue")+
xlim(0, 3000)
Prox_childcare1 <- ggplot(data=HDB_resale_filtered_sf, aes(x= `Prox_childcare`)) +
geom_histogram(bins=20, color="black", fill="light blue")+
xlim(0, 1000)
Prox_MRT_LRT1 <- ggplot(data=HDB_resale_filtered_sf, aes(x= `Prox_MRT_LRT`)) +
geom_histogram(bins=20, color="black", fill="light blue")+
xlim(0, 3000)
Freq_supermarket <- ggplot(data=HDB_resale_filtered_sf, aes(x= `Freq_supermarket`)) +
geom_histogram(bins=20, color="black", fill="light blue")
Freq_hawker <- ggplot(data=HDB_resale_filtered_sf, aes(x= `Freq_hawker`)) +
geom_histogram(bins=20, color="black", fill="light blue")
Freq_childcare <- ggplot(data=HDB_resale_filtered_sf, aes(x= `Freq_childcare`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ggarrange(AREA_SQM, Remaining_Lease, Prox_supermarket1, Prox_hawker1, Prox_childcare1, Prox_MRT_LRT1, Freq_supermarket, Freq_hawker, Freq_childcare)
tmap_mode("view")
tm_shape(mpsz)+
tm_polygons() +
tm_shape(HDB_resale_filtered_sf) +
tm_dots(col = "resale_price",
alpha = 0.6,
style="quantile")